home *** CD-ROM | disk | FTP | other *** search
- 0. Improved efficiency.
-
- * Parse and output array initializers an element at a time, freeing
- storage after each, instead of parsing the whole initializer first and
- then outputting. This would reduce memory usage for large
- initializers.
-
- 1. Better optimization.
-
- * Constants in unused inline functions
-
- It would be nice to delay output of string constants so that string
- constants mentioned in unused inline functions are never generated.
- Perhaps this would also take care of string constants in dead code.
-
- The difficulty is in finding a clean way for the RTL which refers
- to the constant (currently, only by an assembler symbol name)
- to point to the constant and cause it to be output.
-
- * More cse
-
- The techniques for doing full global cse are described in the red
- dragon book, or (a different version) in Frederick Chow's thesis from
- Stanford. It is likely to be slow and use a lot of memory, but it
- might be worth offering as an additional option.
-
- It is probably possible to extend cse to a few very frequent cases
- without so much expense.
-
- For example, it is not very hard to handle cse through if-then
- statements with no else clauses. Here's how to do it. On reaching a
- label, notice that the label's use-count is 1 and that the last
- preceding jump jumps conditionally to this label. Now you know it
- is a simple if-then statement. Remove from the hash table
- all the expressions that were entered since that jump insn
- and you can continue with cse.
-
- It is probably not hard to handle cse from the end of a loop
- around to the beginning, and a few loops would be greatly sped
- up by this.
-
- * Support more general tail-recursion among different functions.
-
- This might be possible under certain circumstances, such as when
- the argument lists of the functions have the same lengths.
- Perhaps it could be done with a special declaration.
-
- You would need to verify in the calling function that it does not
- use the addresses of any local variables and does not use setjmp.
-
- * Put short statics vars at low addresses and use short addressing mode?
-
- Useful on the 68000/68020 and perhaps on the 32000 series,
- provided one has a linker that works with the feature.
- This is said to make a 15% speedup on the 68000.
-
- * Keep global variables in registers.
-
- Here is a scheme for doing this. A global variable, or a local variable
- whose address is taken, can be kept in a register for an entire function
- if it does not use non-constant memory addresses and (for globals only)
- does not call other functions. If the entire function does not meet
- this criterion, a loop may.
-
- The VAR_DECL for such a variable would have to have two RTL expressions:
- the true home in memory, and the pseudo-register used temporarily.
- It is necessary to emit insns to copy the memory location into the
- pseudo-register at the beginning of the function or loop, and perhaps
- back out at the end. These insns should have REG_EQUIV notes so that,
- if the pseudo-register does not get a hard register, it is spilled into
- the memory location which exists in any case.
-
- The easiest way to set up these insns is to modify the routine
- put_var_into_stack so that it does not apply to the entire function
- (sparing any loops which contain nothing dangerous) and to call it at
- the end of the function regardless of where in the function the
- address of a local variable is taken. It would be called
- unconditionally at the end of the function for all relevant global
- variables.
-
- For debugger output, the thing to do is to invent a new binding level
- around the appropriate loop and define the variable name as a register
- variable with that scope.
-
- * Live-range splitting.
-
- Currently a variable is allocated a hard register either for the full
- extent of its use or not at all. Sometimes it would be good to
- allocate a variable a hard register for just part of a function; for
- example, through a particular loop where the variable is mostly used,
- or outside of a particular loop where the variable is not used. (The
- latter is nice because it might let the variable be in a register most
- of the time even though the loop needs all the registers.)
-
- It might not be very hard to do this in global-alloc.c when a variable
- fails to get a hard register for its entire life span.
-
- The first step is to find a loop in which the variable is live, but
- which is not the whole life span or nearly so. It's probably best to
- use a loop in which the variable is heavily used.
-
- Then create a new pseudo-register to represent the variable in that loop.
- Substitute this for the old pseudo-register there, and insert move insns
- to copy between the two at the loop entry and all exits. (When several
- such moves are inserted at the same place, some new feature should be
- added to say that none of those registers conflict merely because of
- overlap between the new moves. And the reload pass should reorder them
- so that a store precedes a load, for any given hard register.)
-
- After doing this for all the reasonable candidates, run global-alloc
- over again. With luck, one of the two pseudo-registers will be fit
- somewhere. It may even have a much higher priority due to its reduced
- life span.
-
- There will be no room in general for the new pseudo-registers in
- basic_block_live_at_start, so there will need to be a second such
- matrix exclusively for the new ones. Various other vectors indexed by
- register number will have to be made bigger, or there will have to be
- secondary extender vectors just for global-alloc.
-
- A simple new feature could arrange that both pseudo-registers get the
- same stack slot if they both fail to get hard registers.
-
- Other compilers split live ranges when they are not connected, or
- try to split off pieces `at the edge'. I think splitting around loops
- will provide more speedup.
-
- Creating a fake binding block and a new like-named variable with
- shorter life span and different address might succeed in describing
- this technique for the debugger.
-
- * Detect dead stores into memory?
-
- A store into memory is dead if it is followed by another store into
- the same location; and, in between, there is no reference to anything
- that might be that location (including no reference to a variable
- address).
-
- * Loop optimization.
-
- Strength reduction and iteration variable elimination could be
- smarter. They should know how to decide which iteration variables are
- not worth making explicit because they can be computed as part of an
- address calculation. Based on this information, they should decide
- when it is desirable to eliminate one iteration variable and create
- another in its place.
-
- It should be possible to compute what the value of an iteration
- variable will be at the end of the loop, and eliminate the variable
- within the loop by computing that value at the loop end.
-
- When a loop has a simple increment that adds 1,
- instead of jumping in after the increment,
- decrement the loop count and jump to the increment.
- This allows aob insns to be used.
-
- * Using constraints on values.
-
- Many operations could be simplified based on knowledge of the
- minimum and maximum possible values of a register at any particular time.
- These limits could come from the data types in the tree, via rtl generation,
- or they can be deduced from operations that are performed. For example,
- the result of an `and' operation one of whose operands is 7 must be in
- the range 0 to 7. Compare instructions also tell something about the
- possible values of the operand, in the code beyond the test.
-
- Value constraints can be used to determine the results of a further
- comparison. They can also indicate that certain `and' operations are
- redundant. Constraints might permit a decrement and branch
- instruction that checks zeroness to be used when the user has
- specified to exit if negative.
-
- * Smarter reload pass.
-
- The reload pass as currently written can reload values only into registers
- that are reserved for reloading. This means that in order to use a
- register for reloading it must spill everything out of that register.
-
- It would be straightforward, though complicated, for reload1.c to keep
- track, during its scan, of which hard registers were available at each
- point in the function, and use for reloading even registers that were
- free only at the point they were needed. This would avoid much spilling
- and make better code.
-
- * Change the type of a variable.
-
- Sometimes a variable is declared as `int', it is assigned only once
- from a value of type `char', and then it is used only by comparison
- against constants. On many machines, better code would result if
- the variable had type `char'. If the compiler could detect this
- case, it could change the declaration of the variable and change
- all the places that use it.
-
- * Order of subexpressions.
-
- It might be possible to make better code by paying attention
- to the order in which to generate code for subexpressions of an expression.
-
- * More code motion.
-
- Consider hoisting common code up past conditional branches or
- tablejumps.
-
- * Trace scheduling.
-
- This technique is said to be able to figure out which way a jump
- will usually go, and rearrange the code to make that path the
- faster one.
-
- * Distributive law.
-
- The C expression *(X + 4 * (Y + C)) compiles better on certain
- machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
- modes. It may be tricky to determine when, and for which machines, to
- use each alternative.
-
- Some work has been done on this, in combine.c.
-
- * Jump-execute-next.
-
- Many recent machines have jumps which optionally execute the following
- instruction before the instruction jumped to, either conditionally or
- unconditionally. To take advantage of this capability requires a new
- compiler pass that would reorder instructions when possible. After
- reload may be a good place for it.
-
- On some machines, the result of a load from memory is not available
- until after the following instruction. The easiest way to support
- these machines is to output each RTL load instruction as two assembler
- instructions, the second being a no-op. Putting useful instructions
- after the load instructions may be a similar task to putting them
- after jump instructions.
-
- * Pipeline scheduling.
-
- On many machines, code gets faster if instructions are reordered
- so that pipelines are kept full. Doing the best possible job of this
- requires knowing which functional units each kind of instruction executes
- in and how long the functional unit stays busy with it. Then the
- goal is to reorder the instructions to keep many functional units
- busy but never feed them so fast they must wait.
-
- * Can optimize by changing if (x) y; else z; into z; if (x) y;
- if z and x do not interfere and z has no effects not undone by y.
- This is desirable if z is faster than jumping.
-
- * For a two-insn loop on the 68020, such as
- foo: movb a2@+,a3@+
- jne foo
- it is better to insert dbeq d0,foo before the jne.
- d0 can be a junk register. The challenge is to fit this into
- a portable framework: when can you detect this situation and
- still be able to allocate a junk register?
-
- * For the 80387 floating point, perhaps it would be possible to use 3
- or 4 registers in the stack to hold register variables. (It would be
- necessary to keep track of how those slots move in the stack as other
- pushes and pops are done.) This is probably very tricky, but if
- you are a GCC wizard and you care about the speed of floating point on
- an 80386, you might want to work on it.
-
- 2. Simpler porting.
-
- Right now, describing the target machine's instructions is done
- cleanly, but describing its addressing mode is done with several
- ad-hoc macro definitions. Porting would be much easier if there were
- an RTL description for addressing modes like that for instructions.
- Tools analogous to genflags and genrecog would generate macros from
- this description.
-
- There would be one pattern in the address-description file for each
- kind of addressing, and this pattern would have:
-
- * the RTL expression for the address
- * C code to verify its validity (since that may depend on
- the exact data).
- * C code to print the address in assembler language.
- * C code to convert the address into a valid one, if it is not valid.
- (This would replace LEGITIMIZE_ADDRESS).
- * Register constraints for all indeterminates that appear
- in the RTL expression.
-
- 3. Other languages.
-
- Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
- desirable.
-
- Pascal, Modula-2 and Ada require the implementation of functions
- within functions. Some of the mechanisms for this already exist.
-
- 4. More extensions.
-
- * Label-addresses as expressions.
-
- It would be nice to have access to the addresses of labels; to be able to
- store them in variables, or initialize vectors of them.
-
- Alas, `&label0' is the address of the variable named label0, which is
- unrelated to the label with that name. Some other syntax is needed.
- Perhaps colon as a unary operator? That is ambiguous with `?:' with
- the middle operand omitted. Perhaps ^ as a unary operator? Perhaps
- `__label__ label0' could mean the value of label0? Its type could be
- `void *'. `goto *EXP' could be used to go to a value of type `void
- *'--no ambiguity there.
-
- Jump optimization and flow analysis must know about computed jumps,
- but that is not hard. Each basic block headed by a possible target of
- computed jumps must be considered a successor of each basic block
- ending in a computed jump. Aside from this, I believe no other
- optimizer changes are needed.
-
- Next question: stack levels. In most functions, there is no problem,
- but it would be a shame to make a feature that doesn't work together
- with other features. Here is an idea:
-
- For each label that might need stack level restoration, construct a
- shadow-label which will restore the stack and jump to the user-label.
- Then use the address of the shadow label for label0 when someone asks
- for that of label0. Jump optimization will delete all the shadow labels
- if the function has no computed gotos.
-
- * Block structure for labels.
-
- The ({...}) construct should serve as a lexical block for label names,
- so that the same label may be defined both inside and outside of it.
- Then macro definitions could use labels internally safely, by enclosing
- the label and the goto in one of these constructs.
-
- * Generated unique labels. Have some way of generating distinct labels
- for use in extended asm statements. I don't know what a good syntax would
- be.
-
- 5. Generalize the machine model.
-
- * Some new compiler features may be needed to do a good job on machines
- where static data needs to be addressed using base registers.
-
- * Some machines have two stacks in different areas of memory, one used
- for scalars and another for large objects. The compiler does not
- now have a way to understand this.
-
- 6. Precompilation of header files.
-
- In the future, many programs will use thousands of lines of header files.
- Compiling the headers might be slower than compiling the guts of any one
- source file. Here is a scheme for precompiling header files to make
- compilation faster for a sequence of headers which is often used.
-
- A precompiled header corresponds to a sequence of header files. The
- preprocessor recognizes when the input starts with a sequence of
- `#include' commands and searches a data base for a precompiled header
- corresponding to that sequence. The modtimes of all these files are
- stored in the data base so that one can tell whether the precompiled
- header is still valid.
-
- For robustness, each directory should have its own collection of
- precompiled headers and its own data base of them. Probably each
- precompiled header would be a file and the data base would be one
- more file.
-
- The data base records the entire collection of predefined macros and
- their definitions, except for __FILE__, __LINE__ and __DATE__, for
- each precompiled header. If this collection does not match the setup
- at the start of the current compilation (including the results of -D
- and -U switches), the precompiled header is inapplicable. It might
- be possible to have distinct precompiled headers for the same sequence
- of header files but different collections of predefined macros.
-
- The state of any option that affects macro processing, such as -ansi
- or -traditional, must also be recorded, and the precompiled header is
- valid only if these options match.
-
- The precompiled header contains an ordered series of strings. Some
- strings are marked "unconditional"; these must be compiled each time
- the precompiled header is used. Other strings are have keys, which
- are identifiers. A string with keys must be compiled if at least one
- of its keys is mentioned in the input. The order these strings appear
- in the precompiled header is called their intrinsic order.
-
- The C preprocessor reads in the precompiled header file and scan all
- the strings, making for each key an entry in the same symbol table
- used for macros, pointing at a list of all the strings for which it is
- a key. Each string must have a flag (one flag per string, not one per
- key per string). The same code in `rescan' that detects references to
- macros would detect a reference to a key and flag all of the strings
- that it belongs to as needing to be output.
-
- Each of these strings is immediately recursively macroexpanded (i.e.
- `rescan' is called), but the output from this is discarded. The
- expansion is to detect any other keys mentioned in the string, and to
- define any macros for which the string contains a #define. The key's
- symbol table entry is be deleted to save time if the key is
- encountered again, and to avoid an infinite recursion.
-
- The unconditional strings are macroexpanded with `rescan' (but the
- output is discarded) at some time before anything is actually output.
-
- At the end of compilation, before any of the actual input text is
- output, the list of strings is scanned in the intrinsic order, and
- each string that is unconditional or flagged is output verbatim,
- except that any #define lines are discarded.
-
- Precompiled headers would be constructed by explicit request with a
- special tool. The first step is to run cpp on the sequence of header
- files' contents. This would use a new option that would cause all
- #define lines to be output unchanged as well as defining the macro.
- The second step is to divide the output into strings, some keyed and
- some unconditional. This division is done without changing the order
- of the text being divided up.
-
- JNC@lcs.mit.edu has some ideas on this subject also.
-
- 7. Better documentation of how GCC works and how to port it.
-
- Here is an outline proposed by Allan Adler.
-
- I. Overview of this document
- II. The machines on which GCC is implemented
- A. Prose description of those characteristics of target machines and
- their operating systems which are pertinent to the implementation
- of GCC.
- i. target machine characteristics
- ii. comparison of this system of machine characteristics with
- other systems of machine specification currently in use
- B. Tables of the characteristics of the target machines on which
- GCC is implemented.
- C. A priori restrictions on the values of characteristics of target
- machines, with special reference to those parts of the source code
- which entail those restrictions
- i. restrictions on individual characteristics
- ii. restrictions involving relations between various characteristics
- D. The use of GCC as a cross-compiler
- i. cross-compilation to existing machines
- ii. cross-compilation to non-existent machines
- E. Assumptions which are made regarding the target machine
- i. assumptions regarding the architecture of the target machine
- ii. assumptions regarding the operating system of the target machine
- iii. assumptions regarding software resident on the target machine
- iv. where in the source code these assumptions are in effect made
- III. A systematic approach to writing the files tm.h and xm.h
- A. Macros which require special care or skill
- B. Examples, with special reference to the underlying reasoning
- IV. A systematic approach to writing the machine description file md
- A. Minimal viable sets of insn descriptions
- B. Examples, with special reference to the underlying reasoning
- V. Uses of the file aux-output.c
- VI. Specification of what constitutes correct performance of an
- implementation of GCC
- A. The components of GCC
- B. The itinerary of a C program through GCC
- C. A system of benchmark programs
- D. What your RTL and assembler should look like with these benchmarks
- E. Fine tuning for speed and size of compiled code
- VII. A systematic procedure for debugging an implementation of GCC
- A. Use of GDB
- i. the macros in the file .gdbinit for GCC
- ii. obstacles to the use of GDB
- a. functions implemented as macros can't be called in GDB
- B. Debugging without GDB
- i. How to turn off the normal operation of GCC and access specific
- parts of GCC
- C. Debugging tools
- D. Debugging the parser
- i. how machine macros and insn definitions affect the parser
- E. Debugging the recognizer
- i. how machine macros and insn definitions affect the recognizer
-
- ditto for other components
-
- VIII. Data types used by GCC, with special reference to restrictions not
- specified in the formal definition of the data type
- IX. References to the literature for the algorithms used in GCC
-
-